Advancing Beyond Generic Prompting

Optimization Through Fine-Tuning and Specialized Architectures

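Although few-shot prompting is a strong starting point, scaling an AI solution often requires going further with supervised fine-tuning (SFT). This process embeds specific knowledge or behavior directly in the model's weights.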

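Key decision: Fine-tune only when the gains in response quality and the savings from reduced token usage outweigh the cost of the required compute and data preparation.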

$\text{Cost} = \text{Tokens} \times \text{Unit price}$
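The cost formula can be turned into a rough break-even check for the fine-tuning decision. The sketch below is illustrative only: the function names, the token counts, the unit price, and the one-off fine-tuning cost are all hypothetical assumptions, and it accounts only for token savings, not for the quality gains that also enter the decision.

```python
def token_cost(tokens: int, unit_price: float) -> float:
    """Cost = tokens x unit price."""
    return tokens * unit_price


def break_even_requests(
    prompt_tokens_before: int,
    prompt_tokens_after: int,
    unit_price: float,
    fine_tuning_cost: float,
) -> float:
    """Number of requests after which per-request token savings
    repay a one-off fine-tuning cost (compute plus data preparation)."""
    saved_per_request = token_cost(prompt_tokens_before - prompt_tokens_after, unit_price)
    if saved_per_request <= 0:
        # No token savings: fine-tuning never pays for itself on cost alone.
        return float("inf")
    return fine_tuning_cost / saved_per_request


# Hypothetical numbers: a long few-shot prompt (2000 tokens) shrinks to
# 500 tokens after fine-tuning, at an assumed price of 0.000002 per token,
# with an assumed one-off fine-tuning cost of 300.
requests_needed = break_even_requests(2000, 500, 0.000002, 300.0)
print(f"Break-even after about {requests_needed:,.0f} requests")
```

With these assumed figures the break-even point is on the order of 100,000 requests, so a low-traffic application might not recover the fine-tuning cost through token savings alone.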

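Small language models (SLMs) are efficient, scaled-down versions of their larger counterparts (e.g., Phi-3.5, Mistral Small). They are trained on highly curated, high-quality data.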

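Trade-offs: SLMs offer significantly lower latency and support edge deployment (running locally on the device), but compared with large language models they give up broad, general-purpose "human-like" intelligence.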

Efficiency Priority Order

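Try prompt engineering first. If that is not enough, implement RAG (retrieval-augmented generation). Use fine-tuning only at the final stage, as an advanced optimization.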
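The priority order above can be read as a cheapest-first escalation loop. The sketch below is one possible way to express it, assuming a hypothetical `evaluate` callback that returns a quality score for each technique; the scores and the target threshold are made up for illustration.

```python
from typing import Callable, Optional

# Cheapest-first order, as described in the text above.
ESCALATION_ORDER = ["prompt engineering", "RAG", "fine-tuning"]


def choose_technique(evaluate: Callable[[str], float], target: float) -> Optional[str]:
    """Return the first technique, in escalation order, whose score meets the target."""
    for technique in ESCALATION_ORDER:
        if evaluate(technique) >= target:
            return technique
    # Nothing reached the target, not even fine-tuning.
    return None


# Hypothetical evaluation scores from an offline test set.
scores = {"prompt engineering": 0.71, "RAG": 0.86, "fine-tuning": 0.93}
chosen = choose_technique(scores.__getitem__, target=0.85)
print(chosen)
```

In practice each evaluation step has its own cost, so one would typically stop at the first technique that is good enough rather than scoring all three up front.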


Question 1

When does the course recommend proceeding with fine-tuning over prompt engineering?

When the benefits in quality and cost (reduced token usage) outweigh compute effort.

Whenever you need the model to sound more human-like.

As the very first step before trying RAG or prompt engineering.

Only when deploying to an edge device.

Question 2

Which model architecture allows scaling model size while maintaining computational efficiency?

Supervised Fine-Tuning (SFT)

Retrieval-Augmented Generation (RAG)

Mixture of Experts (MoE)

Multimodality

Challenge: Edge Deployment Strategy

Apply your knowledge to a real-world scenario.

You need to deploy a multilingual translation tool that runs locally on a laptop with limited GPU resources.

Task 1

Select the appropriate model family and tokenizer for this multilingual, low-resource task.

Solution:
Mistral NeMo with the Tekken Tokenizer. It is optimized for multilingual text and fits within SLM constraints.

Task 2

Define the deployment framework for high-performance local inference.

Solution:
Use ONNX Runtime or Ollama for local execution to maximize hardware acceleration on the laptop.
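As a rough illustration of the Ollama option, the sketch below builds a translation request for Ollama's local REST endpoint (`/api/generate` on port 11434 by default). It assumes an Ollama server is already running on the laptop and that a Mistral NeMo model has been pulled under the tag `mistral-nemo`; the tag, the prompt wording, and the helper names are assumptions made for this example, not part of the course material.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server is configured differently.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_translation_request(text: str, target_language: str, model: str = "mistral-nemo") -> dict:
    """Build the JSON body for a single, non-streaming generation request."""
    prompt = (
        f"Translate the following text into {target_language}. "
        f"Reply with the translation only.\n\n{text}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def translate(text: str, target_language: str) -> str:
    """Send the request to the local Ollama server and return the generated text."""
    body = json.dumps(build_translation_request(text, target_language)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Building the payload needs no server; calling translate() does.
payload = build_translation_request("Good morning", "French")
print(json.dumps(payload, indent=2))
# translate("Good morning", "French")  # requires a running local Ollama server
```

Keeping the payload construction separate from the network call makes the prompt easy to inspect and test without the model loaded.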